A Repetition Test for Pseudo-Random Number Generators

Author: Gil Manuel
Gonnet Gaston H.
Petersen Wesley P.
Publication date: 02/08/2017

A new statistical test for uniform pseudo-random number generators (PRNGs) is presented. The idea is that numbers in a pseudo-random sequence should reappear with a certain probability. The expected time until a repetition occurs provides the metric for the test. For linear congruential generators (LCGs), failure can be shown theoretically. Empirical test results for a number of commonly used PRNGs are reported, showing that some PRNGs considered to have good statistical properties fail. A sample implementation of the test is provided over the Internet.
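The repetition idea in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy LCG parameters (a=5, c=3, m=2**16), and the use of the asymptotic birthday-problem mean sqrt(pi*n/2) + 2/3 as the reference value are all assumptions chosen for illustration.

```python
import math
import random


def repetition_time(next_value):
    """Count draws until a previously seen value reappears."""
    seen = set()
    draws = 0
    while True:
        value = next_value()
        draws += 1
        if value in seen:
            return draws
        seen.add(value)


def expected_repetition_time(n):
    """Asymptotic mean repetition time for uniform draws over n values."""
    return math.sqrt(math.pi * n / 2) + 2 / 3


def make_lcg(seed, a=5, c=3, m=2**16):
    """Toy full-period LCG (hypothetical parameters) returning its whole state."""
    state = seed

    def step():
        nonlocal state
        state = (a * state + c) % m
        return state

    return step


def mean_repetition_time(n, trials, seed=0):
    """Average repetition time of Python's built-in generator over n values."""
    rng = random.Random(seed)
    total = sum(repetition_time(lambda: rng.randrange(n)) for _ in range(trials))
    return total / trials
```

Under these assumptions, a full-period LCG that outputs its entire state cannot repeat a value before it has cycled through all m states, so its repetition time is m + 1, far above the roughly 320 draws expected for uniform draws over 65,536 values.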

RERO DOC Digital Library

Empirical codon substitution matrix

Author: Cannarozzi Gina M
Gonnet Gaston H
Schneider Adrian
Publication venue: BioMed Central
Publication date: 01/06/2005

BACKGROUND: Codon substitution probabilities are used in many types of molecular evolution studies such as determining Ka/Ks ratios, creating ancestral DNA sequences or aligning coding DNA. Until the recent dramatic increase in genomic data enabled construction of empirical matrices, researchers relied on parameterized models of codon evolution. Here we present the first empirical codon substitution matrix entirely built from alignments of coding sequences from vertebrate DNA and thus provide an alternative to parameterized models of codon evolution. RESULTS: A set of 17,502 alignments of orthologous sequences from five vertebrate genomes yielded 8.3 million aligned codons from which the number of substitutions between codons was counted. From this data, both a probability matrix and a matrix of similarity scores were computed. They are 64 × 64 matrices describing the substitutions between all codons. Substitutions from sense codons to stop codons are not considered, resulting in block diagonal matrices consisting of 61 × 61 entries for the sense codons and 3 × 3 entries for the stop codons. CONCLUSION: The amount of genomic data currently available allowed for the construction of an empirical codon substitution matrix. However, more sequence data is still needed to construct matrices from different subsets of DNA, specific to kingdoms, evolutionary distance or different amounts of synonymous change. Codon mutation matrices have advantages for alignments up to medium evolutionary distances and for usages that require DNA such as ancestral reconstruction of DNA sequences and the calculation of Ka/Ks ratios.
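The block-diagonal structure described in this abstract (61 sense codons plus 3 stop codons) can be sketched in Python as follows. This is a toy illustration, not the authors' pipeline: the helper names, the directional counting, and the simple row normalization are assumptions, and the standard genetic code is assumed for the stop codons.

```python
from itertools import product

BASES = "TCAG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]  # 64 codons
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
SENSE_CODONS = [c for c in CODONS if c not in STOP_CODONS]  # 61 codons


def count_substitutions(aligned_pairs):
    """Count codon pairs from alignments, skipping sense/stop pairs."""
    counts = {a: {b: 0 for b in CODONS} for a in CODONS}
    for a, b in aligned_pairs:
        if (a in STOP_CODONS) != (b in STOP_CODONS):
            continue  # sense <-> stop substitutions are not considered
        counts[a][b] += 1
    return counts


def to_probabilities(counts):
    """Row-normalize counts; rows without observations stay all zero."""
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return probs
```

Because sense/stop pairs are skipped, every entry linking a sense codon to a stop codon stays zero, which is what yields the 61 × 61 and 3 × 3 blocks.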

Repository for Publications and Research Data

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

OMA Browser—Exploring orthologous relations across 352 complete genomes

Author: Dessimoz Christophe
Gonnet Gaston H.
Schneider Adrian
Publication date: 02/08/2017

Motivation: Inference of the evolutionary relation between proteins, in particular the identification of orthologs, is a central problem in comparative genomics. Several large-scale efforts with various methodologies and scope tackle this problem, including OMA (the Orthologous MAtrix project). Results: Based on the results of the OMA project, we introduce here the OMA Browser, a web-based tool allowing the exploration of orthologous relations over 352 complete genomes. Orthologs can be viewed as groups across species, but also at the level of sequence pairs, allowing the distinction among one-to-one, one-to-many and many-to-many orthologs. Availability: http://omabrowser.org
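The distinction among one-to-one, one-to-many and many-to-many orthologs mentioned in this abstract can be sketched in Python from a plain list of ortholog pairs between two species. This is a simplified illustration and not OMA's own code: the function name and the gene identifiers in the usage note are hypothetical.

```python
from collections import defaultdict


def classify_pairs(pairs):
    """Label each (gene_a, gene_b) ortholog pair by its cardinality."""
    partners_of_a = defaultdict(set)  # gene in species A -> orthologs in B
    partners_of_b = defaultdict(set)  # gene in species B -> orthologs in A
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    labels = {}
    for a, b in pairs:
        a_has_many = len(partners_of_a[a]) > 1
        b_has_many = len(partners_of_b[b]) > 1
        if a_has_many and b_has_many:
            labels[(a, b)] = "many-to-many"
        elif a_has_many or b_has_many:
            labels[(a, b)] = "one-to-many"
        else:
            labels[(a, b)] = "one-to-one"
    return labels
```

For example, with the hypothetical pairs `("h2", "m2")` and `("h2", "m3")`, gene `h2` has two orthologs in the other species while each of them has only one, so both pairs are labelled one-to-many.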

RERO DOC Digital Library

OMA 2011: orthology inference among 1000 complete genomes

Author: Altenhoff Adrian M.
Dessimoz Christophe
Gonnet Gaston H.
Schneider Adrian
Publication date: 02/08/2017

OMA (Orthologous MAtrix) is a database that identifies orthologs among publicly available, complete genomes. Initiated in 2004, the project is at its 11th release. It now includes 1000 genomes, making it one of the largest resources of its kind. Here, we describe recent developments in terms of species covered; the algorithmic pipeline, in particular the treatment of alternative splicing; and new features of the web (OMA Browser) and programming interface (SOAP API). In the second part, we review the various representations provided by OMA and their typical applications. The database is publicly accessible at http://omabrowser.or

RERO DOC Digital Library

Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits

Author: Boeckmann Brigitte
Dessimoz Christophe
Gonnet Gaston H.
Roth Alexander C. J.
Publication venue: Oxford University Press
Publication date: 01/01/2006

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.

Crossref

PubMed Central

UCL Discovery

Estimates of Positive Darwinian Selection Are Inflated by Errors in Sequencing, Annotation, and Alignment

Author: Adrian Schneider
Alexander Souvorov
Dan Graur
Gaston H. Gonnet
Giddy Landan
Niv Sabath
Publication venue: Oxford University Press
Publication date: 01/01/2009

Published estimates of the proportion of positively selected genes (PSGs) in human vary over three orders of magnitude. In mammals, estimates of the proportion of PSGs cover an even wider range of values. We used 2,980 orthologous protein-coding genes from human, chimpanzee, macaque, dog, cow, rat, and mouse as well as an established phylogenetic topology to infer the fraction of PSGs in all seven terminal branches. The inferred fraction of PSGs ranged from 0.9% in human through 17.5% in macaque to 23.3% in dog. We found three factors that influence the fraction of genes that exhibit telltale signs of positive selection: the quality of the sequence, the degree of misannotation, and ambiguities in the multiple sequence alignment. The inferred fraction of PSGs in sequences that are deficient in all three criteria of coverage, annotation, and alignment is 7.2 times higher than that in genes with high trace sequencing coverage, “known” annotation status, and perfect alignment scores. We conclude that some estimates on the prevalence of positive Darwinian selection in the literature may be inflated and should be treated with caution.

Repository for Publications and Research Data

Crossref

PubMed Central

The OMA orthology database in 2015: function predictions, better plant support, synteny view and other improvements

Author: Altenhoff Adrian M.
Dessimoz Christophe
Glover Natasha
Gonnet Gaston H.
Gori Kevin
Müller Steven
Piližota Ivana
Redestig Henning
Sueki Anna
Tomiczek Bartlomiej
Train Clément-Marie
Škunca Nives
Publication date: 02/08/2017

The Orthologous Matrix (OMA) project is a method and associated database inferring evolutionary relationships amongst currently 1706 complete proteomes (i.e. the protein sequences associated with every protein-coding gene in all genomes). In this update article, we present six major new developments in OMA: (i) a new web interface; (ii) Gene Ontology function predictions as part of the OMA pipeline; (iii) better support for plant genomes and in particular homeologs in the wheat genome; (iv) a new synteny viewer providing the genomic context of orthologs; (v) statically computed hierarchical orthologous groups subsets downloadable in OrthoXML format; and (vi) possibility to export parts of the all-against-all computations and to combine them with custom data for ‘client-side’ orthology prediction. OMA can be accessed through the OMA Browser and various programmatic interfaces at http://omabrowser.or

Repository for Publications and Research Data

RERO DOC Digital Library

Fast estimation of the difference between two PAM/JTT evolutionary distances in triplets of homologous sequences

Author: Adrian Schneider
Christophe Dessimoz
Gaston H Gonnet
Manuel Gil
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation that is used for example to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time consuming. RESULTS: We show that an alternative estimator, based on pairwise estimates and therefore much faster to compute, has almost the same statistical power as the maximum likelihood estimator. We also provide a numerical approximation for its variance, which could otherwise only be estimated through an expensive re-sampling approach such as bootstrapping. An extensive simulation demonstrates that the approximation delivers precise confidence intervals. To illustrate the possible applications of these results, we show how they improve the detection of asymmetric evolution, and the identification of the closest relative to a given sequence in a group of homologs. CONCLUSION: The results presented in this paper constitute a basis for large-scale protein cross-comparisons of pairwise evolutionary distances.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

PubMed Central

UCL Discovery

The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces.

Author: Adrian M Altenhoff
Alex Warwick Vesztrocy
Charles Stevenson
Christophe Dessimoz
Clément-Marie Train
David Dylus
Gaston H Gonnet
Henning Redestig
Jiao Long
Karina Zile
Klara Kaleb
Natasha M Glover
Tarcisio M de Farias
The UniProt Consortium
Publication venue: Oxford University Press (OUP)
Publication date: 27/10/2017

The Orthologous Matrix (OMA) is a leading resource to relate genes across many species from all of life. In this update paper, we review the recent algorithmic improvements in the OMA pipeline, describe increases in species coverage (particularly in plants and early-branching eukaryotes) and introduce several new features in the OMA web browser. Notable improvements include: (i) a scalable, interactive viewer for hierarchical orthologous groups; (ii) protein domain annotations and domain-based links between orthologous groups; (iii) functionality to retrieve phylogenetic marker genes for a subset of species of interest; (iv) a new synteny dot plot viewer; and (v) an overhaul of the programmatic access (REST API and semantic web), which will facilitate incorporation of OMA analyses in computational pipelines and integration with other bioinformatic resources. OMA can be freely accessed at https://omabrowser.org

Repository for Publications and Research Data

Crossref

Serveur académique lausannois

UCL Discovery

Algorithm of OMA for large-scale orthology inference

Author: Alexander CJ Roth
Christophe Dessimoz
Gaston H Gonnet
Publication venue: BioMed Central
Publication date: 01/12/2008

Since the publication of our article (Roth, Gonnet, and Dessimoz: BMC Bioinformatics 2008 9: 518), we have noticed several errors, which we correct in the following.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery